18 research outputs found
Survey on Causal-based Machine Learning Fairness Notions
Addressing the problem of fairness is crucial to safely use machine learning
algorithms to support decisions with a critical impact on people's lives such
as job hiring, child maltreatment, disease diagnosis, loan granting, etc.
Several notions of fairness have been defined and examined in the past decade,
such as statistical parity and equalized odds. The most recent fairness
notions, however, are causal-based and reflect the now widely accepted idea
that using causality is necessary to appropriately address the problem of
fairness. This paper examines an exhaustive list of causal-based fairness
notions, in particular their applicability in real-world scenarios. As the
majority of causal-based fairness notions are defined in terms of
non-observable quantities (e.g. interventions and counterfactuals), their
applicability depends heavily on the identifiability of those quantities from
observational data. In this paper, we compile the most relevant identifiability
criteria for the problem of fairness from the extensive literature on
identifiability theory. These criteria are then used to decide on the
applicability of causal-based fairness notions in concrete discrimination
scenarios.
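As a reading aid for the two observational notions named in this abstract, here is a minimal sketch of how statistical parity and equalized odds are usually measured as gaps between two groups. The function names and the toy data are assumptions made for illustration and do not come from the paper.

```python
def rate(values):
    """Fraction of 1s in a list of 0/1 values (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0


def statistical_parity_difference(y_pred, group):
    """P(Yhat=1 | A=1) - P(Yhat=1 | A=0); 0 means statistical parity holds."""
    g1 = [p for p, a in zip(y_pred, group) if a == 1]
    g0 = [p for p, a in zip(y_pred, group) if a == 0]
    return rate(g1) - rate(g0)


def equalized_odds_gaps(y_true, y_pred, group):
    """Return (TPR gap, FPR gap) between group 1 and group 0.

    Equalized odds holds when both gaps are 0.
    """
    def cond_rate(a_val, y_val):
        return rate([p for p, y, a in zip(y_pred, y_true, group)
                     if a == a_val and y == y_val])

    tpr_gap = cond_rate(1, 1) - cond_rate(0, 1)
    fpr_gap = cond_rate(1, 0) - cond_rate(0, 0)
    return tpr_gap, fpr_gap


# Hypothetical toy data: sensitive attribute, true label, predicted label.
group = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

spd = statistical_parity_difference(y_pred, group)
tpr_gap, fpr_gap = equalized_odds_gaps(y_true, y_pred, group)
```

Both quantities are computed from observed labels and predictions only, which is what makes them observational rather than causal.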
Machine learning fairness notions: Bridging the gap with real-world applications
Fairness emerged as an important requirement to guarantee that Machine
Learning (ML) predictive systems do not discriminate against specific
individuals or entire sub-populations, in particular, minorities. Because
fairness is inherently subjective, several notions of
fairness have been introduced in the literature. This paper is a survey that
illustrates the subtleties between fairness notions through a large number of
examples and scenarios. In addition, unlike other surveys in the literature, it
addresses the question: which notion of fairness is most suited to a given
real-world scenario, and why? Our attempt to answer this question consists of
(1) identifying the set of fairness-related characteristics of the real-world
scenario at hand, (2) analyzing the behavior of each fairness notion, and then
(3) fitting these two elements to recommend the most suitable fairness notion
in every specific setup. The results are summarized in a decision diagram that
can be used by practitioners and policymakers to navigate the relatively large
catalog of ML
Identifiability of Causal-based Fairness Notions: A State of the Art
Machine learning algorithms can produce biased outcomes or predictions, typically
against minorities and under-represented sub-populations. Therefore, fairness
is emerging as an important requirement for the large-scale application of
machine learning based technologies. The most commonly used fairness notions
(e.g. statistical parity, equalized odds, predictive parity, etc.) are
observational and rely on mere correlation between variables. These notions
fail to identify bias in case of statistical anomalies such as Simpson's or
Berkson's paradoxes. Causality-based fairness notions (e.g. counterfactual
fairness, no-proxy discrimination, etc.) are immune to such anomalies and hence
more reliable for assessing fairness. The problem with causality-based fairness
notions, however, is that they are defined in terms of quantities (e.g. causal,
counterfactual, and path-specific effects) that are not always measurable. This
is known as the identifiability problem and is the topic of a large body of
work in the causal inference literature. This paper is a compilation of the
major identifiability results which are of particular relevance for machine
learning fairness. The results are illustrated using a large number of examples
and causal graphs. The paper should be of particular interest to fairness
researchers, practitioners, and policy makers who are considering the use of
causality-based fairness notions, as it summarizes and illustrates the major
identifiability results in a single document.
Comment: arXiv admin note: text overlap with arXiv:2010.0955
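The claim that observational notions can mislead under Simpson's paradox can be illustrated with a small numeric sketch. The admission counts below are hypothetical and chosen only to produce the reversal; the code shows the statistical anomaly itself and makes no claim about which reading is the fair one, since that depends on causal assumptions the counts alone do not provide.

```python
# Hypothetical admission counts: (applicants, admitted) per department and group.
counts = {
    "easy": {"men": (800, 480), "women": (200, 130)},
    "hard": {"men": (200, 40), "women": (800, 200)},
}


def admission_rate(group, departments):
    """Admission rate of a group pooled over the given departments."""
    applicants = sum(counts[d][group][0] for d in departments)
    admitted = sum(counts[d][group][1] for d in departments)
    return admitted / applicants


# Aggregate (statistical-parity style) gap: women minus men over all departments.
overall_gap = admission_rate("women", counts) - admission_rate("men", counts)

# The same gap computed inside each department separately.
per_dept_gap = {
    d: admission_rate("women", [d]) - admission_rate("men", [d]) for d in counts
}
```

In this toy data the pooled gap is negative (women appear disadvantaged) while the gap within each department is positive, so the sign of the disparity depends on whether one conditions on the department.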
Identifiability of Causal-based ML Fairness Notions
Machine learning algorithms can produce biased outcomes or predictions, typically against minorities and under-represented sub-populations. Therefore, fairness is emerging as an important requirement for the large-scale application of machine learning based technologies. The most commonly used fairness notions (e.g. statistical parity, equalized odds, predictive parity, etc.) are observational and rely on mere correlation between variables. These notions fail to identify bias in case of statistical anomalies such as Simpson's or Berkson's paradoxes. Causality-based fairness notions (e.g. counterfactual fairness, no-proxy discrimination, etc.) are immune to such anomalies and hence more reliable for assessing fairness. The problem with causality-based fairness notions, however, is that they are defined in terms of quantities (e.g. causal, counterfactual, and path-specific effects) that are not always measurable. This is known as the identifiability problem and is the topic of a large body of work in the causal inference literature. The first contribution of this paper is a compilation of the major identifiability results which are of particular relevance for machine learning fairness. To the best of our knowledge, no previous work in the field of ML fairness or causal inference provides such a systematization of knowledge. The second contribution is more general and addresses the main problem of using causality in machine learning, that is, how to extract causal knowledge from observational data in real scenarios. This paper shows how this can be achieved using identifiability.
Finding a Needle in a Haystack: The Traffic Analysis Version
Traffic analysis is the process of extracting useful or sensitive information from observed network traffic. Typical use cases include malware detection and website fingerprinting attacks. High-accuracy traffic analysis techniques use machine learning algorithms (e.g. SVM, kNN) and require the traffic to be split into correctly separated blocks. Inspired by digital forensics techniques, we propose a new network traffic analysis approach based on similarity digests. The approach offers several advantages over existing techniques, namely, fast signature generation, compact signature representation using Bloom filters, efficient similarity detection between packet traces of arbitrary sizes, and, in particular, dropping the traffic splitting requirement altogether. Experiments show very promising results on VPN and malware traffic, but poor results on Tor traffic, mainly because Tor uses fixed-size cells.
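To make the idea of a Bloom-filter-backed similarity digest more concrete, the following is a rough sketch and not the authors' algorithm. It assumes sliding byte windows as features, SHA-256 as the hash, and a Jaccard-style bit overlap as the similarity score; the class name, function name, and parameters are all hypothetical.

```python
import hashlib


class BloomDigest:
    """Fixed-size Bloom filter used as a compact signature of a byte trace."""

    def __init__(self, num_bits=2048, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, feature):
        # Derive num_hashes bit positions by salting SHA-256 with an index byte.
        for i in range(self.num_hashes):
            h = hashlib.sha256(bytes([i]) + feature).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, feature):
        for pos in self._positions(feature):
            self.bits |= 1 << pos

    def similarity(self, other):
        """Shared set bits divided by total set bits (both filters same size)."""
        union = bin(self.bits | other.bits).count("1")
        if union == 0:
            return 0.0
        return bin(self.bits & other.bits).count("1") / union


def digest_trace(payload, window=4):
    """Insert every sliding window of `window` bytes into a fresh digest."""
    digest = BloomDigest()
    for i in range(len(payload) - window + 1):
        digest.add(payload[i:i + window])
    return digest


# Hypothetical traces of different lengths; no splitting into blocks is needed.
trace_a = digest_trace(b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n")
trace_b = digest_trace(b"GET /index.htm HTTP/1.1\r\nHost: example.org\r\n")
trace_c = digest_trace(bytes(range(100, 150)))
```

Because the digest has a fixed size regardless of input length, two traces of arbitrary sizes can be compared with a couple of bitwise operations, which is the property the abstract highlights.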
(Local) Differential Privacy has NO Disparate Impact on Fairness
Best Paper Award
In recent years, Local Differential Privacy (LDP), a robust privacy-preserving methodology, has gained widespread adoption in real-world applications. With LDP, users can perturb their data on their devices before sending it out for analysis. However, as the collection of multiple pieces of sensitive information becomes more prevalent across various industries, collecting a single sensitive attribute under LDP may not be sufficient. Correlated attributes in the data may still lead to inferences about the sensitive attribute. This paper empirically studies the impact of collecting multiple sensitive attributes under LDP on fairness. We propose a novel privacy budget allocation scheme that considers the varying domain size of sensitive attributes. This generally led to a better privacy-utility-fairness trade-off in our experiments than the state-of-the-art solution. Our results show that LDP leads to slightly improved fairness in learning problems without significantly affecting the performance of the models. We conduct extensive experiments on three benchmark datasets using several group fairness metrics and seven state-of-the-art LDP protocols. Overall, this study challenges the common belief that differential privacy necessarily leads to worsened fairness in machine learning.
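For context on what perturbing several attributes on the user's device can look like, below is a minimal sketch of generalized randomized response (GRR), one standard LDP protocol. The abstract does not describe its budget allocation scheme, so the uniform split of the privacy budget across attributes used here is only a baseline assumption and not the paper's proposal. The attribute names and domains are hypothetical.

```python
import math
import random


def grr_keep_probability(epsilon, domain_size):
    """Probability that GRR reports the true value under budget epsilon."""
    e = math.exp(epsilon)
    return e / (e + domain_size - 1)


def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p, else a uniform other value."""
    if rng.random() < grr_keep_probability(epsilon, len(domain)):
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def perturb_record(record, domains, total_epsilon, rng):
    """Perturb every attribute of a record, splitting the budget uniformly.

    Assumption: a uniform split (total_epsilon / number of attributes); by
    sequential composition the whole record then satisfies total_epsilon-LDP.
    """
    eps_each = total_epsilon / len(record)
    return {
        name: grr_perturb(value, domains[name], eps_each, rng)
        for name, value in record.items()
    }


# Hypothetical sensitive attributes with different domain sizes.
domains = {"gender": ["f", "m"], "age_band": ["<25", "25-40", "41-60", ">60"]}
record = {"gender": "f", "age_band": "25-40"}
noisy = perturb_record(record, domains, total_epsilon=2.0, rng=random.Random(0))
```

With a uniform split, an attribute with a larger domain keeps its true value less often for the same per-attribute budget, which suggests why an allocation that accounts for domain size, as the abstract proposes, may be worth considering.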